Greedy phylogeny-based orthology assignment and its application to the evolutionary analysis of metabolic coupling

Author

  • Sabine Thuß
Abstract

Orthologous proteins descend from a common ancestral protein via a speciation event and often retain their ancestral functions. Orthology assignment is therefore widely used to identify gene content and functions in newly sequenced species. No commonly accepted gold standard for orthology assignment exists so far. One reason is that different evolutionary mechanisms predominate in different phylogenetic clades: eukaryotic genomes often evolve via gene duplication, while lateral gene transfer (LGT) is more frequent in prokaryotes. Orthology assignment methods are therefore often developed with a specific research aim in mind and resolve the different types of homology in more or less detail. In this work I developed phyloCOP (phylogeny-based Clusters of Orthologous Proteins), a new greedy phylogeny- and reference-based orthology assignment method that detects transitive orthologous relationships in prokaryotes while simultaneously excluding paralogy. PhyloCOP was designed to create orthologous clusters without one-to-many relations (paralogous genes) that can be used directly for function prediction and evolutionary studies. PhyloCOP provides customizable parameters to adjust the algorithm to the requirements of various datasets and research aims. The user defines the reference genome on which the comparative research is based. The degree of transitivity between orthologs within a cluster is also user-specified, which makes phyloCOP adjustable to prokaryotic datasets that include genomes at various phylogenetic distances. To evaluate phyloCOP, clusters generated from 14 and 539 prokaryotic genomes were compared to those produced by comparable sequence similarity-based algorithms. PhyloCOP clusters that correspond to universally distributed Clusters of Orthologous Genes included genes from nearly all analyzed genomes, which indicates good orthology assignment quality.
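The two cluster properties named above (no one-to-many relations, and near-complete genome coverage for universally distributed families) can be checked with a few lines of code. The following is a minimal sketch, not phyloCOP itself: the cluster representation as (genome, gene) pairs, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def has_paralogs(cluster):
    """Return True if any genome contributes more than one gene to the cluster.

    `cluster` is a hypothetical representation: a list of (genome, gene) pairs.
    """
    counts = Counter(genome for genome, _gene in cluster)
    return any(n > 1 for n in counts.values())

def genome_coverage(cluster, all_genomes):
    """Fraction of the analyzed genomes that are represented in the cluster."""
    present = {genome for genome, _gene in cluster}
    return len(present & set(all_genomes)) / len(all_genomes)

# Toy data (invented for illustration only).
genomes = ["ref", "g1", "g2", "g3"]
cluster_ok = [("ref", "geneA"), ("g1", "a1"), ("g2", "a2")]
cluster_bad = [("ref", "geneB"), ("g1", "b1"), ("g1", "b1_copy")]

print(has_paralogs(cluster_ok), genome_coverage(cluster_ok, genomes))
print(has_paralogs(cluster_bad), genome_coverage(cluster_bad, genomes))
```

A cluster of a universally distributed family would be expected to show a coverage close to 1 and no genome listed twice.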
Metabolic networks consist of metabolites connected by reactions, which are catalyzed by enzymes. Complex network connections are best resolved by considering simpler units within the system. Coupled reaction subsets are basic functional modules of metabolic networks in which reactions are connected in a common anabolic, catabolic or transport pathway; they are used in this work to gain insight into the evolution of metabolic networks in prokaryotes. If the metabolic network reactions and the catalytic enzyme composition of the reference genome are established, the metabolic network composition of other genomes can be resolved via transitive orthology prediction. I applied comparative analysis to enzymes that catalyze fully coupled reaction pairs to investigate metabolic network evolution, using Escherichia coli K12 MG1655 as reference. Ancestral relations between 14 E. coli genomes were reconstructed from phyloCOP clusters and displayed topologically in a phylogenetic tree. Genomes were assigned to specific evolutionary times based on their last common ancestor with the reference genome. For each pair of coupled reaction enzymes, the existence of the corresponding enzymes was checked at each ancestral time. To resolve the loss of reaction couplings and the occurrence of gene loss or LGT at specific evolutionary times, the fractions of coupled and non-coupled enzyme pairs were calculated at each ancestral time point. I detected a correlation between gene loss and reaction coupling. All metabolic couplings turned out to be ancient and likely already existed in the common ancestor of the species analyzed. However, there was a trend of increased loss of couplings in individual species with increasing phylogenetic distance. Previously documented gene loss in E. coli DH10B, a substrain of E. coli K12 MG1655, was verified, which further supports the good quality of the clusters generated with phyloCOP.
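The per-time-point fractions described above can be illustrated with a short sketch. It rests on assumptions that are not in the abstract: each genome is represented by the set of reference enzymes for which an ortholog was found, each genome carries an integer time index, and a pair counts as coupled in a genome only when both enzymes are present. All names and data are hypothetical.

```python
from collections import defaultdict

def coupling_fractions(coupled_pairs, enzymes_by_genome, time_by_genome):
    """Fraction of (genome, enzyme pair) observations with both enzymes present,
    pooled over the genomes assigned to each ancestral time point."""
    tallies = defaultdict(lambda: [0, 0])  # time -> [pairs retained, pairs checked]
    for genome, enzymes in enzymes_by_genome.items():
        t = time_by_genome[genome]
        for a, b in coupled_pairs:
            tallies[t][1] += 1
            if a in enzymes and b in enzymes:
                tallies[t][0] += 1
    return {t: kept / total for t, (kept, total) in sorted(tallies.items())}

# Toy data (invented for illustration only).
pairs = [("e1", "e2"), ("e3", "e4")]
enzymes_by_genome = {
    "g1": {"e1", "e2", "e3", "e4"},
    "g2": {"e1", "e2", "e3"},
    "g3": {"e1", "e3"},
}
time_by_genome = {"g1": 1, "g2": 2, "g3": 2}

fractions = coupling_fractions(pairs, enzymes_by_genome, time_by_genome)
print(fractions)
```

The non-coupled fraction at each time point is one minus the value returned here; telling gene loss apart from LGT would need additional information that this sketch does not model.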
To gain deeper insight into the evolution of metabolic coupling, further studies with larger datasets of more distantly related genomes are recommended.

Publication date: 2011